Synthetic-to-real: instance segmentation of clinical cluster cells with unlabeled synthetic training

Abstract
Motivation: The presence of tumor cell clusters in pleural effusion may be a signal of cancer metastasis. Instance segmentation of single cells from cell clusters plays a pivotal role in cluster cell analysis. However, current cell segmentation methods perform poorly on cluster cells because of the overlapping and touching characteristics of clusters, the multiple-instance properties of cells and the poor generalization ability of the models.
Results: In this article, we propose a contour constraint instance segmentation framework (CC framework) for cluster cells, based on a cluster cell combination enhancement module. The framework can accurately locate each instance within a cell cluster and achieves high-precision contour segmentation with only a few samples. Specifically, we propose a contour attention constraint module to alleviate over- and under-segmentation at the boundaries between individual cell instances. In addition, to evaluate the framework, we construct a pleural effusion cluster cell dataset including 197 high-quality samples. Quantitatively, $AP_{mask}$ exceeds 90%, an increase of more than 10% compared with state-of-the-art semantic segmentation algorithms. Qualitatively, our method rarely produces segmentation errors.


Introduction
Comprehensive segmentation of cells is a core analysis step in many histopathology image analysis tasks (Bazgir et al., 2021; Gurcan et al., 2017; Saltz et al., 2017; Shang et al., 2021), and a number of studies have addressed it. Gençtav et al. (2012) and Song et al. (2017) propose frameworks based on prior knowledge and automatic thresholding that gradually separate cervical cell clumps from the background, and cell nuclei and cytoplasm boundaries from the cervical cell regions. U-Net (Yang et al., 2017) and its variants (Isensee et al., 2021; Zhao et al., 2019b) use skip connections to integrate high-level semantic information with low-level fine-grained texture information and thereby improve epithelial nucleus segmentation quality. Yi et al. (2019a) and Zhou et al. (2019b) combine a detector (SSD, Faster R-CNN) with a segmenter (FCN, U-Net) to achieve instance segmentation of neural cells and cervical cells. However, the above methods perform poorly on tumor cluster cells, such as the pleural effusion tumor cluster cells (Sarioglu et al., 2015; Win et al., 2017) illustrated in Figure 1. There are three main reasons. (i) Model generalization. Cluster cell morphology varies widely, and the segmentation methods of Gençtav et al. (2012) and Song et al. (2017) rely heavily on prior knowledge about the cells. Methods that use only fixed prior knowledge therefore perform poorly on new segmentation tasks involving miscellaneous cells. (ii) Multiple-instance properties. Cell pixels in an overlapping or adhesion area belong to multiple instances. Semantic segmentation methods such as Isensee et al. (2021), Yang et al. (2017) and Zhao et al. (2019b) can only assign each pixel in the overlapping area to one instance, which often leads to cell pixels being wrongly attributed. (iii) Cluster properties. Because of distortion, adhesion and overlap between cells, the cell contour is blurred and the contrast is low. Using the mask alone to regress the obscure cell boundary, as in Yi et al. (2019a) and Zhou et al. (2019b), is not sufficient, so these methods often lead to over-segmentation, under-segmentation, and false and missed detection of cell pixels. In addition, labeling cluster cells with blurred outlines requires the professional guidance of several pathologists, which is a time-consuming and laborious process.
Overall, high-quality segmentation of cluster cells remains a significant challenge. We therefore propose a contour constraint instance segmentation framework (CC framework) that requires no prior knowledge and is based on cluster cell combination enhancement (CCCE). First, to reduce the amount of data the network requires, we construct a data enhancement module, CCCE, which enriches cluster cell information using a small number of discrete cells. Second, to avoid false and missed detections in intertwined, complex cell regions, and inspired by keypoints detection (KD; Yi et al., 2019b), the detection branch of the CC framework (the KD module) outputs the top-left, top-right, bottom-left, bottom-right and center points of each cell at multiple scales. Each cell rectangle can then be generated from three of the five points or from any two diagonal points; this allows a pixel to be assigned to multiple cell instances, which further improves detection accuracy and avoids missed detections. Finally, in the cell segmentation branch, to prevent over- or under-segmentation in the fuzzy regions of multi-instance ROIs, we fuse the deep features into the shallow features to recover boundary information and construct the contour attention constraint (CAC) module to constrain the cell boundaries. Because network parameters are shared and updated by backpropagation, the boundary constraint also further enhances the expressive ability of the keypoints.
In summary, our main contributions are given as follows.
1. We construct a pleural effusion cluster cell dataset with three categories: discrete pleural effusion cells (68 labeled images), which are used to synthesize aggregated pleural effusion cells and their labels; synthetic pleural effusion cluster cells (200 samples), which form the training set in our framework; and real pleural effusion cluster cells (129 labeled images), which form the testing set in our framework. This dataset presents major challenges for medical image segmentation, such as overlapping, touching, low contrast and complex backgrounds.
2. We propose a cluster cell instance segmentation framework that requires only an unlabeled synthetic dataset as the training set and achieves high-quality instance segmentation of cell boundaries, close to that of fully supervised methods, on a real pleural effusion tumor cluster cell dataset. The framework includes three modules: CCCE, KD and contour attention constraint segmentation (CACS). The CCCE module can simulate cluster cell images using only a few discrete cell images, which greatly reduces the dependence of the network on the dataset. The KD and CACS modules can be applied to various medical segmentation images, especially cluster cell images of complex scenes.
3. We compare with five state-of-the-art (SOTA) algorithms on two datasets. Our algorithm achieves a 13% increase compared with other segmentation algorithms and attains >97% on the metric $AP_{mask}$ (averaged over Mask IoU thresholds). On the metric $AP_{boundary}^{TP}$ (averaged over Boundary IoU thresholds), the framework also achieves comparable results.

Related work
Image data enhancement
Data augmentation improves the size and quality of training datasets and is widely used in deep learning networks. For example, regularization methods such as Dropout (Srivastava et al., 2014), Cutout (DeVries and Taylor, 2017), Mixup (Zhang et al., 2017) and GridMask (Chen et al., 2020a) have been proposed to address over-fitting. Unfortunately, these data enhancement methods are not practical for medical images, because collecting medical images [such as computerized tomography (CT) scans or tumor cell images] is a time-consuming and laborious process, especially given disease scarcity, patient privacy and the need for pathologist guidance. Increasing the sample size and improving the quality of medical images are therefore essential for medical tasks. Zhao et al. (2019a) learned a model of transformations from labeled examples and used it to synthesize new examples from unlabeled ones, increasing the number of samples. Wolterink et al. (2017) used low-dose CT and routine-dose CT as input to train an adversarial discriminator in a GAN, which evaluates routine-dose CT and reduces the noise of low-dose CT. In general, GANs rely on large datasets and can only be applied to low-resolution images. To solve this problem, Baur et al. (2018) synthesized high-resolution skin pathology images with DDGAN from a small skin disease training sample.
Although these data enhancement methods have achieved remarkable results on medical images, they are still unsuitable for cluster cell data, for two main reasons. First, they need a certain amount of labeled cluster cell data, and labeling cluster cells is inherently difficult. Second, they rely on network training, so the instability and poor generalization of the network can cause the synthetic data to fail. In contrast, we synthesize cluster cells from only a small number of discrete labeled cells. The synthesized cluster cells carry rich overlap, adhesion and noise information, which fits the real data well and alleviates over-fitting. In addition, the process is simple and effective and requires no additional network training.

Cluster cell overlapping occlusion segmentation
Overlapping occlusion phenomena are widespread. To deal with heavy overlap, Ke et al. (2021) and Lazarow et al. (2020) constructed occluder and occludee modules based on the occlusion relationship. These modules make full use of the interaction between two instances and achieved high performance on COCO and Cityscapes panoptic segmentation. Such phenomena also exist in medical images. Some scholars apply prior knowledge to the segmentation of overlapping, occluded cluster cells. Kong et al. (2011) calculated the segmentation boundary from the concave point distance between overlapping instances. Song et al. (2018) designed an energy function over the fragment information of cluster cells, which provides geometric information for overlapping contour segmentation. Although these methods have made some progress in cervical cancer cluster cell segmentation by using boundary, shape and geometric information, they perform poorly in complex situations. The key requirement is robust feature extraction rather than reliance on a single kind of information, and here deep learning shows powerful performance. Paulauskaite-Taraseviciene et al. (2019) used two conceptually different models, U-Net and Mask region-based convolutional neural network (Mask R-CNN), to jointly segment overlapping cluster cells and analyzed the performance of deep learning in this setting. Yi et al. (2019a) and Zhou et al. (2019b) used a two-stage instance segmentation network to extract the bounding boxes of the multiple cell instances and then perform contour segmentation for each cell instance.
Unlike the above methods, our anchor-free KD can fully express intensity (or color) information and shape heterogeneity in the overlapping region, which makes our framework robust to complex textures and highly overlapping cells without any prior shape information.

Cluster cells densely adhered segmentation
In fine cell segmentation, an ongoing challenge is to segment densely contacting, squeezed and deformed cells and to outline blurred cell boundaries. To address this, Liu et al. (2019) added a densely connected conditional random field to a lightweight network to improve segmentation precision. Graham et al. (2019) encoded the distance from nucleus pixels to the centroid; this distance information assists the accurate segmentation of overlapping instances. Chen et al. (2020b) designed a two-stage, coarse-to-fine, fine-grained segmentation network: in the first stage, a network similar to the contour-aware informative aggregation mask net (Zhou et al., 2019a) was used to obtain each instance in the cluster cells; in the second stage, the instance cells were refined through up- and down-sampling and residuals. Similarly, Fan et al. (2020) constructed an attention map to learn each instance end-to-end, which effectively suppresses the background and improves the recognition of overlapping instances.
However, the above algorithms are designed only for extracting fine-grained feature information. Faced with heterogeneous shapes, overlap, adhesion, obscure contours, low contrast, deformation and background impurities, they easily produce false detections, missed detections and over- or under-segmentation, particularly where the gradient across an overlapping boundary is low and the cluster cell shapes are heterogeneous. We therefore propose a CAC module to constrain the cell boundaries, which improves the refinement of overlapping and adhering boundaries and prevents over- or under-segmentation.

Methodology
An overview of the CC framework for cluster cell segmentation is shown in Figure 2. The framework comprises three parts: the cluster cell combination enhancement (CCCE) module, which enhances cluster attributes in the input data; the KD module, which implements the KD-based scheme to obtain instance proposals; and the CACS module, which then segments the cluster cells.

CCCE module
Unlike Cutout, CutMix and GridMask, CCCE is a simple and efficient data enhancement strategy designed for cluster cells. As shown in the left part of Figure 2, we select a very small number of images, each containing only a few cells, from the pleural effusion tumor cell cluster dataset as training samples. Specifically, we randomly select an initial image and $N$ further images from the training samples. We then crop the cell regions from the $N$ images one by one and paste them at random positions onto the initial image to generate the final cluster cell image. Let $x_0 \in \mathbb{R}^{W \times H \times C}$ and $x_t \in \mathbb{R}^{W \times H \times C}$ denote the initial training image and a randomly selected training image, and let $\tilde{x}$ denote the cell image synthesized by the CCCE module; the generation process is given in Algorithm 1. Assuming that $x_t$ contains $K$ cells $x_k$, the diversity of each cell $x_k$ can be increased through transformations such as random rotation, color jitter and Gaussian noise. $A$ is the operation that obtains cell coordinates. Base_coord, Crop_coord and Overlap_Coord represent the coordinates of $x_0$, $x_k$ and the overlapping regions, respectively; each is a binary mask in $\{0, 1\}^{W \times H}$. The operators $\odot$ and $\oplus$ denote element-wise multiplication and addition. In our experiments, all training sample pixels are normalized to $(0, 1)$, so a pixel value in the overlapping area may exceed 1 after addition, and the area would then not look like a real overlap. We therefore divide the overlapping area by a factor $\alpha \in \{1.8, 2.2, 2.5\}$ to approximate real images. In addition, a foreground-area threshold $th$ controls whether cell synthesis continues or terminates, which effectively avoids an imbalance between positive and negative samples.
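The paste-and-normalize step described above can be illustrated with a short NumPy sketch. This is not the authors' Algorithm 1: the function names `paste_cell` and `synthesize_cluster`, the argument `alpha` (the overlap divisor), the rule that a pasted cell replaces background pixels, and the use of the mean foreground fraction as the stopping test against `th` are all assumptions made for illustration. Per-cell augmentation (rotation, color jitter, noise) is omitted.

```python
import numpy as np

def paste_cell(canvas, canvas_mask, cell, cell_mask, top, left, alpha=2.2):
    """Paste one cropped cell onto the canvas (hypothetical helper).

    canvas, cell: float arrays in [0, 1], shapes (H, W, C) and (h, w, C).
    canvas_mask, cell_mask: binary foreground masks, shapes (H, W) and (h, w).
    Returns the new image, the merged foreground mask and the pasted cell's mask.
    """
    canvas_mask = canvas_mask.astype(bool)   # plays the role of Base_coord
    cell_mask = cell_mask.astype(bool)
    h, w = cell_mask.shape

    # Put the cell and its mask on empty full-size layers (Crop_coord).
    layer = np.zeros_like(canvas)
    layer_mask = np.zeros_like(canvas_mask)
    layer[top:top + h, left:left + w] = cell * cell_mask[..., None]
    layer_mask[top:top + h, left:left + w] = cell_mask

    overlap = canvas_mask & layer_mask       # Overlap_Coord
    new_only = layer_mask & ~canvas_mask     # cell pixels landing on background

    out = canvas.copy()
    out[new_only] = layer[new_only]          # assumption: cell replaces background
    # Summed intensities can exceed 1, so the overlap is divided by alpha.
    out[overlap] = (canvas[overlap] + layer[overlap]) / alpha
    return np.clip(out, 0.0, 1.0), canvas_mask | layer_mask, layer_mask

def synthesize_cluster(base, base_mask, cells, rng, th=0.5, alphas=(1.8, 2.2, 2.5)):
    """Paste cells at random positions until the foreground fraction reaches th."""
    canvas, mask = base.copy(), base_mask.astype(bool)
    instance_masks = []                      # one label mask per pasted cell
    for cell, cell_mask in cells:
        if mask.mean() >= th:                # assumed form of the threshold test
            break
        h, w = cell_mask.shape
        top = int(rng.integers(0, canvas.shape[0] - h + 1))
        left = int(rng.integers(0, canvas.shape[1] - w + 1))
        canvas, mask, inst = paste_cell(canvas, mask, cell, cell_mask, top, left,
                                        alpha=float(rng.choice(alphas)))
        instance_masks.append(inst)
    return canvas, mask, instance_masks
```

Because each pasted cell keeps its own full-size mask, the instance labels of the synthetic cluster come for free from the labels of the discrete cells, which is the property the CCCE module relies on.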

KD module
Our CC framework uses ResNet50 as the backbone. To avoid missed and false detections, keypoints generation (KG) detects five keypoints, each with an embedding vector, for every cell bounding box: the top-left, top-right, bottom-left, bottom-right and center points. These keypoints are then grouped into cell bounding boxes by bounding boxes grouping (BBG). The flowchart is shown in Figure 3 (KD module).

Keypoints generation
To handle cells at multiple scales, we fuse the features of backbone layers C0-C3 as the input to KG. Heatmaps $h(x)$, offsets $O(x)$ and distances $D(x)$ are output through a $7 \times 7$ convolution, a ReLU and a second $7 \times 7$ convolution. Heatmaps are commonly used to represent the positions of key nodes in human pose estimation (Newell et al., 2016). Similarly, in our KD, the heatmap $h(x)$ represents the possible corners of cell bounding boxes and has five channels, one per keypoint category. To create a heatmap, each channel contains a disc $d_r(y) = \{x : \|x - y\| \le r\}$ for every keypoint, where $y$ and $r$ are the position of the keypoint and the radius of the disc, respectively; $h(x) = 1$ for $x \in d_r(y)$ and $h(x) = 0$ otherwise. In addition, our KD uses downsampling and upsampling to merge receptive fields of different scales. When a location $y$ in the heatmap $h(x)$ is mapped to $\lfloor y / n \rfloor$ (where $n$ is the sampling factor), some precision may be lost. Hence, to improve prediction accuracy, we predict offsets $O(x)$ with $5 \times 2$ channels to penalize deviations of the heatmap locations, and we set multiple radii on the multi-scale features to supervise the 2D position of a keypoint $y$ from coarse to fine.
Here, $x_k$ represents the $k$th keypoint of the image and $B$ denotes the bilinear interpolation kernel.
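The disc-shaped heatmap target and the precision lost by downsampling can be sketched as follows. The helper names `disc_heatmap` and `downsample_with_offset` are hypothetical, and the offset is taken to be the fractional part $y/n - \lfloor y/n \rfloor$, as is common in corner-based detectors; the source does not spell out its exact offset target, so treat that choice as an assumption.

```python
import numpy as np

def disc_heatmap(height, width, keypoints_per_channel, radius):
    """Build a binary heatmap with one channel per keypoint category.

    keypoints_per_channel: list (one entry per channel) of lists of (row, col).
    A pixel x is set to 1 when ||x - y|| <= radius for some keypoint y.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((len(keypoints_per_channel), height, width), dtype=np.float32)
    for channel, keypoints in enumerate(keypoints_per_channel):
        for r0, c0 in keypoints:
            inside = (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
            heatmap[channel][inside] = 1.0
    return heatmap

def downsample_with_offset(y, n):
    """Map a location y to floor(y / n) and return the fractional part that is lost."""
    y = np.asarray(y, dtype=np.float64)
    coarse = np.floor(y / n)
    offset = y / n - coarse          # assumed regression target for O(x)
    return coarse.astype(int), offset

# Five channels: top-left, top-right, bottom-left, bottom-right, center.
target = disc_heatmap(7, 7, [[(3, 3)], [], [], [], []], radius=1)
coarse, offset = downsample_with_offset((13, 6), n=4)
```

A larger radius on coarse feature maps and a smaller one on fine maps would give the coarse-to-fine supervision mentioned above.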

Bounding boxes grouping
After obtaining the keypoint score map $h'(x)$, we sort its keypoints by score in descending order and greedily connect each pair $(k, l)$ of keypoints belonging to the same object using the predicted graph distance $D_{k,l}(x)$:

$$D_{k,l}(x) = y_l - x, \quad x \in d_r(y_k)$$

where $k$ and $l$ index two keypoints. The connections between keypoints are directional, as in a directed graph, so the distance output has $10 \times 2 \times 2$ channels ($10 \times 2$ connection types, each with an X and a Y component). As with the offsets, we use an L1 loss to penalize errors in the predicted graph distance $D_{k,l}(x)$. The groups obtained by this connection procedure are shown in Figure 3 (BBG). To reduce the chance of losing box proposals, we form a box from any group of two (diagonal) to five keypoints, and then use DIoU NMS to suppress repeated detections of the same object.
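As an illustration of the last step, the sketch below builds a box from two diagonal keypoints and applies DIoU NMS, where DIoU is IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box. The function names and the `(x1, y1, x2, y2)` box layout are assumptions for this example, and the greedy keypoint grouping itself is not reproduced.

```python
def box_from_diagonal(p, q):
    """Axis-aligned box (x1, y1, x2, y2) spanned by two diagonal keypoints."""
    (x1, y1), (x2, y2) = p, q
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - d2 / c2 if c2 > 0 else iou

def diou_nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes; drop any box whose DIoU with a kept box >= threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[j]) < threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = diou_nms(boxes, scores=[0.9, 0.8, 0.7], threshold=0.5)
```

Compared with plain IoU NMS, the center-distance penalty makes it less likely that two heavily overlapping boxes with clearly separated centers, as occurs with adjacent cells, are merged into one detection.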

CACS module
In cluster cell segmentation, over- and under-segmentation often occur in highly overlapping areas. To overcome these drawbacks, we propose the CACS module shown in Figure 3 (CACS module), which consists of the following steps.
(i) Feature extraction. Using the boxes from the KD detector, we crop and fuse the multi-scale features from the backbone. These fused features contain both deep semantic information and shallow boundary information.
(ii) CAC. We propose the CAC module, shown in Figure 4, to focus on cell boundaries. The location information of the fused feature $X \in \mathbb{R}^{C \times H \times W}$ is extracted by coordinate attention (CA; Hou et al., 2021). Specifically, CA applies average pooling along the horizontal and vertical directions to encode the fused feature $X$ into location information $X_x \in \mathbb{R}^{C \times 1 \times W}$ and $X_y \in \mathbb{R}^{C \times H \times 1}$. Because cell features are represented by neighboring pixels in a local region, we fuse $X_x$ and $X_y$ to strengthen the correlation of the location features. A $1 \times 1$ convolution followed by a sigmoid then yields the boundary attention weight. Finally, multiplying the attention weight with the fused feature helps the network identify overlapping and adhering areas. The CAC module outputs both the cell boundary and the cell mask. The boundary constraint has two effects: first, it further penalizes over- and under-segmentation in the loss; second, its influence is not limited to the segmentation network, because updating the shared network parameters also improves the perception of keypoints.
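The pooling, fusion and gating steps of the CAC attention can be sketched in NumPy as below. This is a simplified illustration rather than the module in Figure 4: fusing $X_x$ and $X_y$ by broadcast addition, and representing the $1 \times 1$ convolution by a single channel-mixing matrix `w` without bias, are assumptions, and the boundary and mask output heads are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contour_attention(x, w):
    """Directional pooling -> fusion -> 1x1 conv -> sigmoid gate (illustrative).

    x: fused feature of shape (C, H, W).
    w: (C, C) matrix standing in for the 1x1 convolution weights (hypothetical).
    """
    x_x = x.mean(axis=1, keepdims=True)      # (C, 1, W): pooled over the H axis
    x_y = x.mean(axis=2, keepdims=True)      # (C, H, 1): pooled over the W axis
    fused = x_x + x_y                        # assumption: broadcast sum to (C, H, W)
    logits = np.einsum("oc,chw->ohw", w, fused)   # 1x1 conv = per-pixel channel mixing
    attention = sigmoid(logits)              # boundary attention weight in (0, 1)
    return attention * x                     # re-weight the fused feature

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 16, 16))
weights = rng.standard_normal((8, 8)) * 0.1
gated = contour_attention(feature, weights)  # same shape as the input feature
```

Because the two pooled tensors keep one spatial axis each, the attention weight at a pixel depends on its whole row and column, which is what lets the gate respond to elongated structures such as cell contours.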

Experiment
In this section, we report extensive experiments that demonstrate the effectiveness of the CC framework. We first briefly introduce the experimental datasets and the evaluation metrics. We then give the implementation and training details and summarize the evaluation of our framework. Finally, we provide ablation studies that demonstrate the effectiveness of each component of the CC framework.

Datasets
PETCCD. We constructed a pleural effusion tumor cluster cell dataset (PETCCD). This dataset is divided into two types: discrete pleural effusion tumor cells (DPETC, 68 training samples) and aggregated pleural effusion tumor clusters (APETC, 109 images). To verify the effectiveness of the proposed CCCE module, we define the data synthesized by the CCCE module as the combined enhancement of pleural effusion tumor cluster (CEOPETC, 200 training samples). For all of these datasets, we use 24 images selected from the 109 APETC images as the validation set; the remaining 85 APETC images are used for training.
OCWBC. The overlapping cluster of white blood cell (OCWBC) dataset is a public dataset that can be used for cell instance segmentation. It includes two categories: discrete white blood cells (DWBC, 25 training samples) and dense OCWBC (DOCWBC, 78 images). We synthesized 250 training samples from DWBC, referred to as the combined enhancement of OCWBC (CEOOCWBC). As with PETCCD, the validation set (24 images) is selected from DOCWBC, and the remaining 54 images are used for training.
During training, we resize each input image to $512 \times 512$ while preserving its aspect ratio. Data enhancement methods such as random expansion, cropping, flipping and contrast distortion are used to increase sample diversity and mitigate overfitting. Ground-truth (GT) boxes are used as bounding boxes to train the CACS module. We train for a total of 200 epochs: boundary prediction is frozen for the first 100 epochs and unfrozen for the last 100. During testing, the input image is first passed through the detector to obtain bounding boxes, which are then fed to the CACS module for cell instance segmentation. The learning rate is set to 0.0001 and the batch size to 4. The DIoU NMS threshold and the segmentation threshold are both set to 0.5.

Boundary segmentation evaluation metrics
In instance segmentation, most papers use $AP_{mask}$ as the metric to evaluate their algorithms. $AP_{mask}$ computes the precision-recall curve from Mask IoU (intersection over union). Compared with the pixel-level IoU of semantic segmentation, however, the response of Mask IoU to boundary quality is uneven across objects of different scales, so the segmentation quality of predicted boundary pixels cannot be evaluated accurately. Boundary IoU (Cheng et al., 2021) was proposed to address this; it is sensitive to boundary errors on objects of multiple scales and does not over-penalize errors at any particular scale. Mask IoU and Boundary IoU are calculated as in Equations (4) and (5):

$$\mathrm{Mask\ IoU} = \frac{|G_m \cap P_m|}{|G_m \cup P_m|} \tag{4}$$

$$\mathrm{Boundary\ IoU} = \frac{|(G_d \cap G_m) \cap (P_d \cap P_m)|}{|(G_d \cap G_m) \cup (P_d \cap P_m)|} \tag{5}$$

where $G_m$ and $P_m$ refer to the ground-truth (GT) and predicted masks, respectively, and $G_d$ and $P_d$ are the sets of all pixels within distance $d$ of the contours of $G_m$ and $P_m$, respectively. We choose $AP_{mask}$ and $AP_{boundary}^{TP}$ as the quantitative segmentation evaluation criteria. $AP_{mask}$ is the standard evaluation metric for instance segmentation (Lin et al., 2014). $AP_{boundary}^{TP}$ is computed only over true-positive detections (averaged over Boundary IoU thresholds), which removes the influence of falsely detected objects on the evaluation of boundary segmentation quality.
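The two IoU variants can be compared on binary masks with a small NumPy sketch. Following the definition of Cheng et al. (2021), the boundary region of a mask is taken as the mask pixels that disappear after eroding the mask $d$ times; the $3 \times 3$ square structuring element, the zero padding at the image border and the function names are choices made for this illustration.

```python
import numpy as np

def erode(mask, iterations):
    """Binary erosion with a 3x3 square; pixels outside the image count as background."""
    out = mask.astype(bool)
    height, width = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        eroded = np.ones_like(out)
        for dr in (0, 1, 2):                 # AND over the 3x3 neighborhood
            for dc in (0, 1, 2):
                eroded &= padded[dr:dr + height, dc:dc + width]
        out = eroded
    return out

def boundary_region(mask, d):
    """Mask pixels within d pixels of the mask contour (G_d ∩ G_m or P_d ∩ P_m)."""
    mask = mask.astype(bool)
    return mask & ~erode(mask, d)

def iou(a, b):
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def mask_iou(gt, pred):
    return iou(gt, pred)

def boundary_iou(gt, pred, d=1):
    return iou(boundary_region(gt, d), boundary_region(pred, d))

gt = np.zeros((8, 8), dtype=bool)
gt[1:7, 1:7] = True                          # 6 x 6 ground-truth square
pred = np.zeros((8, 8), dtype=bool)
pred[2:7, 1:7] = True                        # prediction misses the top row
print(mask_iou(gt, pred), boundary_iou(gt, pred, d=1))
```

In this toy case the prediction misses a single row of the object: Mask IoU stays at 30/36, while Boundary IoU drops to 14/24, showing how the boundary metric penalizes contour errors that the mask metric largely absorbs.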

Experimental results
We compare the proposed CC framework with state-of-the-art algorithms on the CEOPETC and CEOOCWBC datasets. Table 1 shows that our method achieves more than 90% on $AP_{mask}$, much better than the other algorithms. On $AP_{boundary}^{TP}$, which is calculated only on positive samples, our method is 1.0-2.74% lower than the best-performing RefineMask, but its overall results still outperform the other instance segmentation algorithms. To compare our method visually with some competitive algorithms, selected qualitative segmentation results are shown in Figures 5 and 6. On both the PETCCD and OCWBC datasets, most comparison algorithms suffer from severe confusion between multi-instance objects (false detection) and loss of prediction targets (missed detection), especially in overlap and adhesion regions. These algorithms are also unable to differentiate obscure cell boundaries, which may lead to over- and under-segmentation of cluster cell pixels. In contrast, our framework performs remarkably well at capturing the tiny and fuzzy boundary information of cluster cells, with barely any false detection, missed detection or over-segmentation.
In addition, to evaluate the effectiveness of the CCCE module, we compare the results on the synthesized datasets (CEOPETC, CEOOCWBC) with those on the original datasets, as shown in Table 2. The segmentation results obtained with the synthesized datasets clearly outperform those obtained with the discrete cell datasets (DPETC, DWBC), and on some metrics are even higher than those obtained with the aggregated cell datasets (APETC, DOCWBC). These experimental results demonstrate the significant advantages of the CCCE module for segmenting cluster cells when data are scarce.

Ablation experiment
In this section, to justify the effectiveness of the CAC module, we perform ablation experiments on the PETCCD and OCWBC datasets. Table 3 shows the results. After adding the CAC module to the baseline, the $AP_{mask}$ results on PETCCD and CEOOCWBC both increase, by 0.74-3.38%, and the $AP_{boundary}^{TP}$ results increase by 0.01-0.79%. This demonstrates that the proposed CAC module further constrains the segmentation boundary and improves the expressive ability of keypoint features.

Discussion
As can be seen from Table 1 and Figures 5 and 6, anchor-based methods such as Cascade Mask R-CNN and RefineMask tend to produce missed and false detections in overlap and adhesion regions. One possible reason is that it is difficult for these methods to select an appropriate initial anchor size and number to match each cell in a densely populated area, so regressing from the anchors introduces offsets in the predicted bounding boxes. In keypoint detection algorithms, on the other hand, AP depends directly on the locations of the keypoints and on how they are connected. For ANCIS, the keypoint positions deviate from the GT boxes, so the bounding boxes connected from these keypoints cannot accurately cover each cluster cell. Compared with these methods, our KD-based multi-scale feature fusion and boundary constraints improve the expressive ability of the keypoint features. After combining two to five keypoints and applying DIoU NMS, the detector can effectively filter redundant bounding boxes and retain the high-scoring predictions. The quantitative results show that our detector effectively avoids missed and false detections. In addition, the boundaries of cluster cells in out-of-focus images tend to be blurry, and relying on mask regression alone is not enough to extract the detailed features of a fuzzy boundary; the comparison algorithms therefore tend toward over- and under-segmentation. We thus introduce the CAC module, which encodes detailed boundary features and penalizes boundary loss. From Table 3 and Figures 5 and 6, we can observe that our method with the CAC module performs better than the SOTA algorithms.

Conclusion
Accurately segmenting cluster cells can help clinicians focus on the lesion area and achieve a more accurate diagnosis of lung cancer. In this article, we propose a novel CC framework, which leverages cluster cell combination enhancement for more efficient feature representation of miscellaneous cells. Quantitative and qualitative results demonstrate the advantages of our method in segmenting cluster cell instances. In the future, we plan to construct a multi-category, sample-imbalanced cluster cell dataset that is more consistent with clinical pathological images. In addition, we will pay more attention to semi-supervised methods, both to reduce the complexity of labeling data and to improve the ability of the model to adaptively learn cluster property features.